Agriculture is facing a labor crisis, leading to increased interest in small agricultural robots (agbots) that can perform precise, targeted actions (e.g., crop scouting, weeding, fertilization) while being supervised by human operators. However, farmers are not necessarily experts in robotics, and they will not adopt technologies that increase their workload or fail to provide an immediate payoff. In this work, we explore methods of communication between a remote human operator and multiple agbots, and study the impact of audio communication on operator preference and productivity. We developed a simulation platform in which agbots are deployed in a field, randomly encounter failures, and call on the operator for help. As the agbots report errors, various audio communication mechanisms are tested for conveying which robot has failed and what type of failure has occurred. The human is tasked with verbally diagnosing each failure while completing a secondary task. A user study was conducted to test three audio communication methods: earcons, single-phrase commands, and full-sentence communication. Each participant completed a survey to determine their preferences and the overall efficiency of each method. Our results indicate that the single-phrase system was perceived most positively by participants and allowed them to complete the secondary task more efficiently. The code is available at: https://github.com/akamboj2/agbot-sim.
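The simulation loop described above can be sketched in a few lines. This is a toy illustration, not the authors' agbot-sim implementation: the failure types, fault probability, and message templates are all hypothetical placeholders, and the three rendering modes mirror the three audio conditions in the study.

```python
import random

# Illustrative failure types (not taken from the paper).
FAILURE_TYPES = ["stuck wheel", "low battery", "camera fault"]

def audio_message(robot_id, failure, mode):
    """Render a failure report in one of the three studied audio styles."""
    if mode == "earcon":
        return f"<tone-{robot_id}>"            # non-speech audio cue
    if mode == "single_phrase":
        return f"Robot {robot_id}: {failure}"  # terse spoken phrase
    if mode == "full_sentence":
        return (f"Robot {robot_id} has stopped because it encountered "
                f"a {failure} and needs your help.")
    raise ValueError(f"unknown mode: {mode}")

def simulate(num_robots=3, steps=10, mode="single_phrase", seed=0):
    """Deployed agbots randomly encounter faults and report them."""
    rng = random.Random(seed)
    reports = []
    for _ in range(steps):
        for robot in range(num_robots):
            if rng.random() < 0.2:  # robots randomly encounter failures
                failure = rng.choice(FAILURE_TYPES)
                reports.append(audio_message(robot, failure, mode))
    return reports
```

In the real platform the operator would then diagnose each report verbally while performing a secondary task; here `simulate` only produces the stream of notifications.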
With the development of on-body wearable sensing technology, human activity recognition has become an attractive research field. With comfortable electronic textiles, sensors can be embedded into clothing so that human motion can be recorded over long periods. However, a long-standing issue is how to deal with the motion artifacts introduced by movement of the sensor relative to the body. Surprisingly, recent empirical findings suggest that clothing-attached sensors can actually achieve higher activity recognition accuracy than rigidly attached sensors, especially when predicting from short time windows. In this work, a probabilistic model is introduced in which this improved accuracy is attributed to the increased statistical distance between movements as recorded through fabric sensing. The model's predictions are validated in simulated and real human motion-capture experiments, where this counterintuitive effect is closely captured.
Variational optimization of neural-network representations of quantum states has been successfully applied to solve interacting fermionic problems. Despite rapid development, significant scalability challenges arise when considering large-scale molecules, which correspond to non-locally interacting quantum spin Hamiltonians consisting of thousands or even millions of Pauli operators. In this work, we introduce scalable parallelization strategies to improve neural-network-based variational quantum Monte Carlo calculations for ab-initio quantum chemistry applications. We establish GPU-supported local-energy parallelism to compute the optimization objective for Hamiltonians of potentially complex molecules. Using autoregressive sampling techniques, we demonstrate systematic improvement in the wall-clock time required to achieve CCSD baseline target energies. Performance is further enhanced by adapting the structure of the resulting spin Hamiltonian to the autoregressive sampling order. The algorithm achieves promising performance in comparison with classical approximate methods and exhibits both runtime and scalability advantages over existing neural-network-based methods.
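The optimization objective mentioned above hinges on the local energy of a Pauli-string Hamiltonian, $E_{\mathrm{loc}}(x) = \sum_k c_k\,\langle x|P_k|\psi\rangle/\langle x|\psi\rangle$, which is what the paper parallelizes on GPUs. The sketch below evaluates it for a toy two-spin system with a dense dictionary wavefunction standing in for the neural network; the Hamiltonian is illustrative, not molecular, and Y operators are omitted to keep the phases real.

```python
def apply_pauli(pauli, state):
    """Apply a Pauli string (e.g. 'XZ') to a computational basis state
    given as a tuple of 0/1 spins; return (phase, new_state).
    Only I, X, Z are handled, so all phases stay real in this sketch."""
    phase, new = 1.0, list(state)
    for i, p in enumerate(pauli):
        if p == "X":
            new[i] ^= 1                          # flip spin i
        elif p == "Z":
            phase *= -1.0 if state[i] else 1.0   # sign from spin i
        elif p != "I":
            raise ValueError(f"unsupported Pauli: {p}")
    return phase, tuple(new)

def local_energy(hamiltonian, psi, x):
    """E_loc(x) = sum_k c_k * phase_k * psi(x'_k) / psi(x),
    where `hamiltonian` is a list of (pauli_string, coefficient) pairs
    and `psi` maps basis states to (real) amplitudes."""
    total = 0.0
    for pauli, coeff in hamiltonian:
        phase, xp = apply_pauli(pauli, x)
        total += coeff * phase * psi.get(xp, 0.0)
    return total / psi[x]
```

In the paper's setting each sampled configuration's local energy involves thousands of such terms, which is exactly the sum that lends itself to GPU parallelism.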
Type-B aortic dissection (TBAD) is one of the most serious cardiovascular events, characterized by a rising annual incidence and the severity of its prognosis. Currently, computed tomography angiography (CTA) has been widely adopted for the diagnosis and prognosis of TBAD. Accurate segmentation of the true lumen (TL), false lumen (FL), and false lumen thrombus (FLT) in CTA is crucial for the precise quantification of anatomical features. However, existing works focus only on TL and FL without considering FLT. In this paper, we propose ImageTBAD, the first 3D computed tomography angiography (CTA) image dataset of TBAD with annotations of TL, FL, and FLT. The proposed dataset contains 100 TBAD CTA images, which is a decent size compared with existing medical imaging datasets. As FLT can appear almost anywhere along the aorta with irregular shapes, FLT segmentation represents a wide class of segmentation problems in which targets exist in a variety of positions with irregular shapes. We further propose a baseline method for automatic segmentation of TBAD. Results show that the baseline method achieves results comparable with existing works on aorta and TL segmentation. However, the segmentation accuracy of FLT is only 52%, which leaves large room for improvement and shows the challenge of our dataset. To facilitate further research on this challenging problem, our dataset and code are released to the public.
Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL) yet has been criticized for learning inefficiency. We believe insufficient utilization of training signals is responsible. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch with the disjoint regulation to raise the usage of tokens for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual-branch architecture to respectively predict invisible (masked) and visible (unmasked) tokens with superior learning targets. Rooted in orthogonal perspectives on training efficiency improvement, DM and JD cooperatively accelerate training convergence without sacrificing the model's generalization ability. Concretely, DM can train ViT with half of the effective training epochs (3.7 times less time-consuming) while reporting competitive performance. With JD, our DMJD clearly improves the linear probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks like semantic segmentation, object detection, etc., our DMJD also presents superior generalization compared with state-of-the-art SSL methods. The code and model will be made public at https://github.com/mx-mark/DMJD.
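One plausible reading of the disjoint masking step can be sketched as follows: masked indices for successive views are drawn without replacement from a shared pool, so the union of reconstruction targets covers as many tokens as possible while each view keeps its masking rate. This is an illustrative interpretation of the abstract, not the authors' actual regulation, whose details live in the DMJD repository.

```python
import random

def disjoint_masks(num_tokens, mask_ratio, num_views, seed=0):
    """Sample `num_views` masks over `num_tokens` token positions.
    Each mask keeps the per-view masking rate; masked indices are drawn
    without replacement across views so their union is maximized, and
    the pool is reshuffled only when exhausted."""
    rng = random.Random(seed)
    per_view = int(num_tokens * mask_ratio)
    pool = list(range(num_tokens))
    rng.shuffle(pool)
    masks = []
    for _ in range(num_views):
        if len(pool) < per_view:          # pool exhausted: refill it
            pool = list(range(num_tokens))
            rng.shuffle(pool)
        masks.append(sorted(pool[:per_view]))
        pool = pool[per_view:]
    return masks
```

With, say, 16 tokens, a 25% rate, and 4 views, the masks partition the token grid, so every token serves as a reconstruction target exactly once per image.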
Cohn and Umans proposed a framework for developing fast matrix multiplication algorithms based on embedding the computation in certain group algebras. In subsequent work with Kleinberg and Szegedy, they connected this to the search for combinatorial objects called strong uniquely solvable puzzles (strong USPs). We begin a systematic computer-aided search for these objects. We develop and implement constraint-based algorithms built on reductions to $\mathrm{SAT}$ and $\mathrm{IP}$ to verify that puzzles are strong USPs, and to search for large strong USPs. We produce tight bounds on the maximum size of a strong USP for width $k \le 5$, construct puzzles of small width that are larger than those in previous work, and improve the upper bounds on strong USP size for $k \le 12$. Although our work only deals with puzzles of small constant width, the strong USPs we find imply matrix multiplication algorithms that run in $O(n^\omega)$ time with exponent $\omega \le 2.66$. While our algorithms do not beat the fastest known algorithms, our work provides evidence and, perhaps, a path to finding families of strong USPs that imply matrix multiplication algorithms more efficient than those currently known.
This paper presents a practical global optimization algorithm for the K-center clustering problem, which aims to select K samples as the cluster centers to minimize the maximum within-cluster distance. The algorithm is based on a reduced-space branch-and-bound scheme and guarantees convergence to the global optimum in a finite number of steps by branching only on the regions of centers. To improve efficiency, we have designed a two-stage decomposable lower bound whose solution can be derived in closed form. In addition, we propose several acceleration techniques to narrow down the region of centers, including bounds tightening, sample reduction, and parallelization. Extensive studies on synthetic and real-world datasets demonstrate that our algorithm can solve K-center problems to global optimality within 4 hours for ten million samples in serial mode and one billion samples in parallel mode. Moreover, compared with state-of-the-art heuristic methods, the global optimum obtained by our algorithm reduces the objective function by an average of 25.8% across all the synthetic and real-world datasets.
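To make the objective concrete, here is a minimal sketch of the K-center objective together with Gonzalez's classic farthest-first heuristic, the kind of method the paper's exact branch-and-bound is compared against. This is the textbook 2-approximation, not the paper's algorithm, and it is shown in 1-D for brevity.

```python
def kcenter_objective(points, centers):
    """Maximum over all points of the distance to the nearest center,
    i.e. the quantity K-center clustering minimizes (1-D here)."""
    return max(min(abs(p - c) for c in centers) for p in points)

def greedy_kcenter(points, k):
    """Gonzalez's farthest-first traversal: start from an arbitrary
    point, then repeatedly add the point farthest from the current
    centers. A classic 2-approximation heuristic for K-center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(abs(p - c) for c in centers)))
    return centers
```

The heuristic's gap to the true optimum is exactly what an exact solver like the paper's can quantify; the reported 25.8% average improvement is measured against such heuristic solutions.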
Video-language pre-training has advanced the performance of various downstream video-language tasks. However, most previous methods directly inherit or adapt typical image-language pre-training paradigms to video-language pre-training, thus not fully exploiting the unique characteristic of video, i.e., its temporal dimension. In this paper, we propose a Hierarchical Temporal-Aware video-language pre-training framework, HiTeA, with two novel pre-training tasks for modeling cross-modal alignment between moments and texts as well as the temporal relations of video-text pairs. Specifically, we propose a cross-modal moment exploration task to explore moments in videos, which results in detailed video moment representations. Besides, the inherent temporal relations are captured by aligning video-text pairs as a whole at different time resolutions with a multi-modal temporal relation exploration task. Furthermore, we introduce the shuffling test to evaluate the temporal reliance of datasets and video-language pre-training models. We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks, especially on temporal-oriented datasets (e.g., SSv2-Template and SSv2-Label), with improvements of 8.6% and 11.1%, respectively. HiTeA also demonstrates strong generalization ability when directly transferred to downstream tasks in a zero-shot manner. Models and demo will be available on ModelScope.
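The shuffling test mentioned above admits a simple generic sketch: score a video in its original frame order, score it again under random frame permutations, and take the gap as a measure of temporal reliance. The function below is an illustration of that diagnostic idea, not HiTeA's implementation; `score_fn` stands in for any model's scoring routine.

```python
import random

def shuffling_test(score_fn, video, trials=5, seed=0):
    """Temporal-reliance diagnostic: compare a model's score on the
    original frame order against randomly shuffled orders. A large
    positive gap suggests the model (or dataset) genuinely depends on
    temporal cues; a gap near zero suggests it does not."""
    rng = random.Random(seed)
    original = score_fn(video)
    shuffled_scores = []
    for _ in range(trials):
        frames = list(video)
        rng.shuffle(frames)                 # destroy temporal order
        shuffled_scores.append(score_fn(frames))
    return original - sum(shuffled_scores) / trials
```

For example, a toy `score_fn` that counts consecutively ordered frame pairs yields a large gap on an ordered clip, mimicking how a temporally-oriented dataset like SSv2 would expose order-insensitive models.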
Despite excellent performance in image generation, Generative Adversarial Networks (GANs) are notorious for their requirements of enormous storage and intensive computation. As an awesome ''performance maker'', knowledge distillation has been demonstrated to be particularly efficacious in exploring low-priced GANs. In this paper, we investigate the irreplaceability of the teacher discriminator and present an inventive discriminator-cooperated distillation, abbreviated as DCD, towards refining better feature maps from the generator. In contrast to conventional pixel-to-pixel matching methods in feature map distillation, our DCD utilizes the teacher discriminator as a transformation to drive intermediate results of the student generator to be perceptually close to the corresponding outputs of the teacher generator. Furthermore, in order to mitigate mode collapse in GAN compression, we construct a collaborative adversarial training paradigm where the teacher discriminator is established from scratch to co-train with the student generator in company with our DCD. Our DCD shows superior results compared with existing GAN compression methods. For instance, after reducing over 40x MACs and 80x parameters of CycleGAN, we decrease the FID metric from 61.53 to 48.24, while the current SoTA method only reaches 51.92. This work's source code has been made accessible at https://github.com/poopit/DCD-official.
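The core contrast with pixel-to-pixel distillation can be sketched in one function: the student and teacher generator outputs are first mapped through the teacher discriminator's feature extractor, and the match is enforced in that perceptual space. This is a bare-bones sketch of the idea with plain Python lists standing in for feature maps; the real DCD operates on tensors and intermediate discriminator layers.

```python
def dcd_loss(disc_features, student_out, teacher_out):
    """Discriminator-cooperated distillation sketch: instead of matching
    generator outputs pixel-to-pixel, compare them after passing through
    the teacher discriminator's feature extractor `disc_features`, so the
    student is pulled perceptually toward the teacher."""
    fs = disc_features(student_out)
    ft = disc_features(teacher_out)
    # Mean squared distance in discriminator feature space.
    return sum((a - b) ** 2 for a, b in zip(fs, ft)) / len(fs)
```

Because the discriminator was trained to tell realistic outputs apart, distances in its feature space weight perceptually salient differences more heavily than raw pixel differences do, which is the intuition the abstract appeals to.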
The task of referring video object segmentation (RVOS) aims to segment the object in the frames of a given video to which the referring expressions refer. Previous methods adopt a multi-stage approach and design complex pipelines to obtain promising results. Recently, the end-to-end method based on Transformers has proved its superiority. In this work, we draw on the advantages of the above methods to provide a simple and effective pipeline for RVOS. First, we improve the state-of-the-art one-stage method ReferFormer to obtain mask sequences that are strongly correlated with language descriptions. Second, based on a reliable and high-quality keyframe, we leverage the superior performance of a video object segmentation model to further enhance the quality and temporal consistency of the mask results. Our single model reaches 70.3 J&F on the Referring Youtube-VOS validation set and 63.0 on the test set. After ensembling, we achieve 64.1 on the final leaderboard, ranking 1st place in the CVPR2022 Referring Youtube-VOS challenge. Code will be available at https://github.com/Zhiweihhh/cvpr2022-rvos-challenge.git.